[BUG] Fix actor pool project splitting when column is not renamed #2998

kevinzwang · 2024-10-04T20:21:30Z

Previously, this would fail:

```py
import os

# Set the feature flag before importing daft, as in the original report.
os.environ["DAFT_ENABLE_ACTOR_POOL_PROJECTIONS"] = "1"

import daft
from daft import udf

@udf(
    return_dtype=daft.DataType.int64(),
    batch_size=1
)
class MyUDF:
    def __init__(self):
        # import time
        # time.sleep(10)
        pass

    def __call__(self, _):
        # import time
        # time.sleep(10)

        import os

        # Return the ID of the process running this call.
        pid = os.getpid()
        return [pid]

MyUDF = MyUDF.with_concurrency(4)

df = daft.from_pydict({"a": list(range(10))})
df = df.into_partitions(4)
# Apply the UDF to column "a" twice without renaming the result.
df = df.select(MyUDF(df["a"]))
df = df.select(MyUDF(df["a"]))
df.show()
```

This failed because splitting a project into multiple actor pool projects creates new names for the intermediate columns, which loses the information about the original column name. This PR fixes that by adding an alias back to the original name at the end of the actor pool projects.
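The renaming problem can be illustrated with a toy model. This is only a sketch and not Daft's optimizer code: the `split_projection` function, the `restore_alias` flag, and the `__tmp_` naming scheme are all hypothetical, chosen to show why a trailing alias keeps the original column name visible to later operations.

```python
def split_projection(column: str, num_stages: int, restore_alias: bool) -> list[str]:
    """Return the output column name produced by each stage of a split projection.

    Toy model only: the intermediate naming scheme below is an assumption
    made for illustration, not the scheme Daft uses.
    """
    names = []
    for i in range(num_stages):
        # Each split stage writes its result to a fresh intermediate name.
        names.append(f"__tmp_{i}_{column}")
    if restore_alias:
        # A final alias maps the last intermediate name back to the original.
        names.append(column)
    return names


without_alias = split_projection("a", 2, restore_alias=False)
with_alias = split_projection("a", 2, restore_alias=True)

# Without the alias, a later lookup of "a" has nothing to resolve against.
print("final name without alias:", without_alias[-1])
print("final name with alias:", with_alias[-1])
```

In this model, a second projection that asks for column `a` only succeeds in the `restore_alias=True` case, which mirrors the repeated `select` in the reproduction above.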

codspeed-hq · 2024-10-04T20:34:57Z

CodSpeed Performance Report

Merging #2998 will not alter performance

Comparing kevin/split-actor-pool-alias (e558ac7) with main (cd59c73)

Summary

✅ 17 untouched benchmarks

codecov · 2024-10-04T21:14:49Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.12%. Comparing base (a62d276) to head (e558ac7).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2998      +/-   ##
==========================================
+ Coverage   77.80%   78.12%   +0.31%     
==========================================
  Files         602      602              
  Lines       71892    71461     -431     
==========================================
- Hits        55938    55830     -108     
+ Misses      15954    15631     -323

| Files with missing lines | Coverage Δ |
|---|---|
| ...al_optimization/rules/split_actor_pool_projects.rs | `95.15% <100.00%> (+0.39%)` ⬆️ |

... and 18 files with indirect coverage changes

jaychia

Awesome

jaychia · 2024-10-05T01:42:05Z

src/daft-plan/src/logical_optimization/rules/split_actor_pool_projects.rs

    recursive_count: usize,
 ) -> DaftResult<Transformed<Arc<LogicalPlan>>> {
+    // TODO: eliminate the need for recursive calls by doing a post-order traversal of the plan tree.


Excellent :)
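For context on the TODO above, a post-order traversal visits a node's children before the node itself, and it can be written with an explicit stack instead of recursion. The sketch below uses a hypothetical `Node` class and toy plan names; it is not Daft's `LogicalPlan` type or the eventual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a plan node; not Daft's LogicalPlan."""
    name: str
    children: list["Node"] = field(default_factory=list)


def post_order(root: Node) -> list[str]:
    """Return node names in post-order (children first) without recursion."""
    order = []
    # Each stack entry is (node, whether its children were already scheduled).
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            order.append(node.name)
        else:
            # Revisit this node after all of its children have been emitted.
            stack.append((node, True))
            for child in reversed(node.children):
                stack.append((child, False))
    return order


plan = Node("project_2", [Node("project_1", [Node("source")])])
visited = post_order(plan)
print(visited)
```

Because every child is handled before its parent, a rewrite rule applied in this order sees already-rewritten inputs, which is one way such a traversal could replace repeated recursive calls.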


[BUG] Fix actor pool project splitting when column is not renamed

aabd04e

kevinzwang requested a review from jaychia October 4, 2024 20:21

github-actions bot added the bug label Oct 4, 2024

kevinzwang added 2 commits October 4, 2024 13:56

make alias condition more restrictive

6d6490b

make wording better

520a04c

jaychia approved these changes Oct 5, 2024

Delete println

56f5305

Co-authored-by: Jay Chia <[email protected]>

kevinzwang enabled auto-merge (squash) October 5, 2024 01:47

format

e558ac7

kevinzwang merged commit 53a84ea into main Oct 5, 2024
40 checks passed

kevinzwang deleted the kevin/split-actor-pool-alias branch October 5, 2024 03:17
